DETAILED ACTION



Specification 

The abstract of the disclosure is objected to for having excessive length. An abstract 
should be a concise description of the claimed invention and contain no more than 150 words. 
Correction is required. See MPEP § 608.01(b). 



Claim Rejections - 35 USC § 102 

The text of those sections of Title 35, U.S. Code not included in this action can be found 
in a prior Office action. 

Claims 1-3, 8, 11-13, and 27-30 are rejected under 35 U.S.C. 102(b) as being 
anticipated by Raman (U.S. Patent 5,748,186). 



Regarding claim 1, Raman teaches retrieving a modality-independent document from 
local storage (taught as the storing of a "common intermediate high-level data structure" stored 
in memory, which is then retrieved by a presenter, at col. 3, lines 6-11) where the
modality-independent document is an intent-based document (made so by the use of forms for
conducting a dialog with a user, and enabling user-specified transactions, at col. 4, lines 21-28), 
parsing the modality-independent document using parsing rules obtained from local or remote 
storage (taught as the parsing of a source document, which must inherently contain rules for 
parsing on local or remote storage, at col. 4, lines 45-49), converting the modality-independent 
document into a first intermediate representation that can be rendered by a speech user 
interface and converting the modality-independent document into a second intermediate 
representation that can be rendered by a graphical user interface (taught as the conversion of
the common intermediate structure into aural information or visual information, at col. 3, lines
11-13), building a cross-reference table by which the speech user interface can access 
components comprising the second intermediate representation (taught as the ability of a user
to interact with forms through speech input, at col. 4, lines 21-27; the program must inherently
contain speech definitions for accessing the graphical interface), rendering the first and second
intermediate representations in their respective modalities (taught as the use of rendering 
methods to render a document into a specified modality, at col. 7, lines 53-57), and receiving a 
user input in one of the GUI and speech user interface modalities to enable multi-modal 
interaction and control the document presentation (taught as the use of an interactive interface 
to control the document, at col. 3, lines 31-35). 

Regarding claim 2, Raman teaches synchronizing GUI and speech modalities at col. 3, 
lines 21-23. 

Regarding claim 3, Raman teaches storing the first intermediate representation in a local 
system memory for immediate rendering, taught as the storing of a common structure in 
memory, at col. 3, lines 6-8. 

Regarding claim 8, Raman teaches executing an applications program corresponding to 
an event call within the modality-independent document, taught as the execution of a browser to 
interact with links specified in the common intermediate structure, at col. 3, lines 47-51 and
56-62.

Regarding claim 11, Raman teaches a method for registering a program to be executed
upon completion of a user-specified event, taught as the use of event methods to bind a 
presentation method and thus an application to a document object for a particular event, at col. 
7, lines 33-38. 

Regarding claim 13, the system of Raman is inherently composed of computer- 
executable instructions and stored on a machine-readable storage device. 

Regarding claim 27, Raman teaches a multi-modal manager for parsing a modality- 
independent document to generate a traversal model that maps components of a modality- 
independent document to first and second modality-specific representations (taught as the use 
of a recognizer to convert a common intermediate structure into visual, aural, or tactile 
information, at col. 3, lines 6-8) where the modality-independent document is an intent-based 
document (made so by the use of forms for conducting a dialog with a user, and enabling user- 
specified transactions, at col. 4, lines 21-28), a speech user interface manager for rendering and 
presenting a first modality-specific representation in a speech modality and a GUI manager for 
rendering and presenting the second modality-specific representation of a GUI modality (taught 
as the use of a presenter for presenting aural and visual information to a user, at col. 3, lines 8- 
16), an event queue monitor for detecting GUI events and an event queue for storing captured 
GUI events (inherently taught as the I/O control of external devices by an operating system), 
and a plurality of methods, called by a speech user interface manager for synchronizing I/O 
events across speech and GUI modalities (taught as the concurrently processed navigational 
methods of col. 5, lines 39-47).

Regarding claim 28, Raman teaches the methods for synchronizing I/O events
comprising a first method for polling for the occurrence of GUI events in the event queue and a 
second method for reflecting speech events back to the GUI manager and posting speech 
events to the multi-modal manager, taught as the use of visible and audible navigational 
methods associated with objects at col. 5, lines 39-47, and combined with the inherent I/O 
queues of an operating system as shown supra. 

Regarding claim 29, Raman teaches a method for invoking user-specified programs that 
are specified in the modality-independent document, taught as the use of event methods to 
bind a presentation method and thus an application to a document object for a particular event, 
at col. 7, lines 33-38. 

Regarding claim 30, Raman teaches a multi-modal manager comprising a main renderer 
that instantiates a GUI manager, a speech user interface manager, and a method for capturing 
GUI events, taught as the use of a recognizer for converting a document into a common 
intermediate structure, which in turn instantiates a presenter for presenting data graphically or 
aurally, at col. 3, lines 6-16, and an interactive interface controlling I/O devices, at col. 3, lines 
30-34. 

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or 
described as set forth in section 102 of this title, if the differences between the subject 
matter sought to be patented and the prior art are such that the subject matter as a 
whole would have been obvious at the time the invention was made to a person having
ordinary skill in the art to which said subject matter pertains. Patentability shall not be
negatived by the manner in which the invention was made. 

Claims 4-7, 9, 10, 14-26, 32, and 33 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Raman and Ehsani et al. (U.S. Publication 2002/0032564), hereinafter
Ehsani.

Regarding claim 4, Raman teaches converting a modality-independent document to a 
first intermediate representation, taught as the conversion of a common intermediate structure 
into aural information or visual information, at col. 3, lines 11-13. 

Raman fails to explicitly teach transcoding a modality-independent document to a 
speech markup script. 

Ehsani teaches the generation of recognition grammars from source pages to be used in
a speech interface similar to that of Raman, at ¶ 0022 of the disclosure. Ehsani also teaches
transcoding a modality-independent document to a speech markup script, taught as the
implementation of a voice page in Voice XML, at ¶ 0231.

Therefore, it would have been obvious to one of ordinary skill in the art, having the 
teachings of Raman and Ehsani before him at the time the invention was made to modify the 
speech interface of Raman to include the transcoding of a modality-independent document into 
Voice XML of Ehsani in order to obtain a speech user interface with voice scripting capabilities. 

One would be motivated to make such a combination for the advantage of allowing a 
user to interact with an application vocally and defining the vocal interactions in a highly 
structured language such as Voice XML. See Ehsani, ¶ 0230.

Regarding claim 5, Raman teaches deferred rendering of a speech markup script, taught 
as the user selection of a presentation modality, at col. 2, lines 49-51.

Regarding claim 6, Ehsani teaches storing the speech markup script on a local
persistent storage device, taught as the use of a voice server for storing the voice pages, where 
the voice server is also the apparatus used to implement the Web-based application, or client 
computer, at ¶ 0232.

Regarding claim 7, Ehsani teaches creating a speech markup script in VXML, at ¶ 0231.

Regarding claims 9 and 10, Ehsani teaches updating existing grammar rules with data 
values returned from the applications program and updating content values associated with a 
component of the modality-independent document using data values returned from the 
applications program, taught as the editing of grammars at ¶ 0245 and listing of option values at
¶ 0244.

Regarding claim 14, Raman teaches preparing an internal representation of a structure 
and component attributes of a modality-independent document (taught as the parsing of a 
source document and creation of an element tree, at col. 5, lines 1-10) where the modality- 
independent document is an intent-based document (made so by the use of forms for 
conducting a dialog with a user, and enabling user-specified transactions, at col. 4, lines 21-28), 
and presenting an aural description of the modality-independent document in response to the 
spoken request (taught as the "speaking" of a document at col. 4, lines 6-7, in response to the 
speech recognizer control of the presenter module, at col. 3, lines 31-35), where presenting an 
aural description of the modality-independent document comprises providing global help 
information by presenting document components, attributes, and methods of interaction (taught
as the aural presentation of document data, which inherently contains components, attributes,
and methods, at col. 5, lines 21-47). 

Raman fails to explicitly teach building a grammar comprising rules for resolving specific 
spoken requests and processing a spoken request utilizing the grammar rules. 

Ehsani teaches the generation of recognition grammars from source pages to be used in
a speech interface similar to that of Raman, at ¶ 0022 of the disclosure. Furthermore, Ehsani
teaches building a grammar comprising rules for resolving specific spoken requests (taught as
the generation of a grammar through a voice page, at ¶¶ 0237-0246), and processing a spoken
request utilizing the grammar rules (taught as the use of a voice recognition system to
understand user statements, at ¶ 0248).

Therefore, it would have been obvious to one of ordinary skill in the art, having the 
teachings of Raman and Ehsani before him at the time the invention was made to modify the 
multiple-modality presentation system of Raman to include the grammar building system of
Ehsani, in order to obtain a presentation system capable of constructing document-specific
vocal grammars. 

One would be motivated to make such a combination for the advantage of document 
portability. See Ehsani, ¶ 0009.

Regarding claim 16, the system of Raman is inherently composed of computer- 
executable instructions and stored on a machine-readable storage device. 

Regarding claim 17, Raman teaches preparing an internal representation of a structure 
and component attributes of a modality-independent document (taught as the parsing of a 
source document and creation of an element tree, at col. 5, lines 1-10) where the
modality-independent document is an intent-based document (made so by the use of forms for
conducting a dialog with a user, and enabling user-specified transactions, at col. 4, lines 21-28), 
and presenting an aural description of a modality-independent document by presenting 
document components, attributes, or methods of interaction to provide contextual help 
information, taught as the aural presentation of document data, which inherently contains 
components, attributes, and methods, at col. 5, lines 21-47. 

Raman fails to explicitly teach building a grammar comprising rules for resolving specific 
spoken requests and processing a spoken request utilizing the grammar rules. 

Ehsani teaches the generation of recognition grammars from source pages to be used in
a speech interface similar to that of Raman, at ¶ 0022 of the disclosure. Furthermore, Ehsani
teaches building a grammar comprising rules for resolving specific spoken requests (taught as
the generation of a grammar through a voice page, at ¶¶ 0237-0246), and processing a spoken
request utilizing the grammar rules (taught as the use of a voice recognition system to
understand user statements, at ¶ 0248).

Therefore, it would have been obvious to one of ordinary skill in the art, having the 
teachings of Raman and Ehsani before him at the time the invention was made to modify the 
multiple-modality presentation system of Raman to include the grammar building system of
Ehsani, in order to obtain a presentation system capable of constructing document-specific
vocal grammars. 

One would be motivated to make such a combination for the advantage of document 
portability. See Ehsani, ¶ 0009.

Regarding claim 18, Ehsani teaches a method wherein building a grammar comprises
the step of combining values obtained from data stored in local storage or remote storage, with
values obtained from an analysis of the modality-independent document, taught as the creation
of a grammar based on parsing reference page values to create grammar phrases, at
¶¶ 0237-0246.

Regarding claim 19, the system of Raman is inherently composed of computer- 
executable instructions and stored on a machine-readable storage device. 

Regarding claim 20, Raman teaches preparing an internal representation of a structure 
and component attributes of a modality-independent document (taught as the parsing of a 
source document and creation of an element tree, at col. 5, lines 1-10) where the modality- 
independent document is an intent-based document (made so by the use of forms for 
conducting a dialog with a user, and enabling user-specified transactions, at col. 4, lines 21-28), 
and presenting an aural description of the modality-independent document in response to the 
spoken request to provide feedback information, taught as the "speaking" of a document (col. 4, 
lines 6-7), in response to the speech recognizer control of the presenter module, at col. 3, lines 
31-35. 

Raman fails to explicitly teach building a grammar comprising rules for resolving specific 
spoken requests, processing a spoken request utilizing the grammar rules, and obtaining state 
and value information regarding specified components of the document from the internal 
representation of the document. 

Ehsani teaches the generation of recognition grammars from source pages to be used in
a speech interface similar to that of Raman, at ¶ 0022 of the disclosure. Furthermore, Ehsani
teaches building a grammar comprising rules for resolving specific spoken requests (taught as
the generation of a grammar through a voice page, at ¶¶ 0237-0246), and processing a spoken
request utilizing the grammar rules (taught as the use of a voice recognition system to
understand user statements, at ¶ 0248). Ehsani also teaches obtaining state and value
information regarding specified components of the document from the internal representation of
the document, taught as the creation of a grammar based on parsing reference page values to
create grammar phrases, at ¶¶ 0237-0246.

Therefore, it would have been obvious to one of ordinary skill in the art, having the 
teachings of Raman and Ehsani before him at the time the invention was made to modify the 
multiple-modality presentation system of Raman to include the grammar building system of
Ehsani, in order to obtain a presentation system capable of constructing document-specific
vocal grammars. 

One would be motivated to make such a combination for the advantage of document 
portability. See Ehsani, ¶ 0009.

Regarding claim 21, Ehsani teaches a method for building a grammar by combining 
values obtained from data stored in local storage or remote storage, with values obtained from 
analysis of the document, taught as the creation of a grammar based on parsing reference page 
values to create grammar phrases, at ¶¶ 0237-0246.

Regarding claim 22, the system of Raman is inherently composed of computer- 
executable instructions and stored on a machine-readable storage device. 

Regarding claim 23, Raman teaches preparing an internal representation of a structure 
and component attributes of a modality-independent document, taught as the parsing of a 
source document and creation of an element tree, at col. 5, lines 1-10, and presenting an aural
description of the modality-independent document in response to the spoken request, taught as
the "speaking" of a document (col. 4, lines 6-7), in response to the speech recognizer control of
the presenter module, at col. 3, lines 31-35. Raman also teaches presenting each character of
content value information requested in response to a spoken request, taught as the display of
visual information (col. 3, lines 11-16) in response to an I/O manipulation of a presenter by a
speech recognizer (col. 3, lines 31-35).

Raman fails to explicitly teach building a grammar comprising rules for resolving specific 
spoken requests, processing a spoken request utilizing the grammar rules, and obtaining state 
and value information regarding specified components of the document from the
internal representation of the document. 

Ehsani teaches the generation of recognition grammars from source pages to be used in
a speech interface similar to that of Raman, at ¶ 0022 of the disclosure. Furthermore, Ehsani
teaches building a grammar comprising rules for resolving specific spoken requests (taught as
the generation of a grammar through a voice page, at ¶¶ 0237-0246), and processing a spoken
request utilizing the grammar rules (taught as the use of a voice recognition system to
understand user statements, at ¶ 0248). Ehsani also teaches obtaining state and value
information regarding specified components of the document from the internal
representation of the document, taught as the creation of a grammar based on parsing
reference page values to create grammar phrases, at ¶¶ 0237-0246.

Therefore, it would have been obvious to one of ordinary skill in the art, having the 
teachings of Raman and Ehsani before him at the time the invention was made to modify the 
multiple-modality presentation system of Raman to include the grammar building system of
Ehsani, in order to obtain a presentation system capable of constructing document-specific
vocal grammars. 



Application/Control Number: 09/876,714 Page 13 

Art Unit: 2173 

One would be motivated to make such a combination for the advantage of document portability.
See Ehsani, ¶ 0009.

Regarding claim 24, Raman teaches inserting pauses between each character of the 
content value information to be presented, taught as the pausing of the presentation on links to 
enable easier user selection, at col. 4, lines 12-14. 

Regarding claim 25, Ehsani teaches a method for building a grammar by combining 
values obtained from data stored in local storage or remote storage, with values obtained from 
analysis of the document, taught as the creation of a grammar based on parsing reference page 
values to create grammar phrases, at ¶¶ 0237-0246.

Regarding claim 26, the system of Raman is inherently composed of computer- 
executable instructions and stored on a machine-readable storage device. 

Regarding claims 32 and 33, Ehsani teaches the use of Voice XML in creating voice 
pages from source pages, and allows for the presentation of such pages, at ¶ 0232.

Claim 31 is rejected under 35 U.S.C. 103(a) as being unpatentable over Raman and 
Dietz (U.S. Patent 6,175,820). 

Raman has been shown to teach a speech user interface manager at col. 3, lines 8-16. 

However, Raman fails to explicitly teach the use of JSAPI in the speech user interface 
manager.

Dietz teaches a system for enhanced computerized speech communication, which
utilizes JSAPI to interface with the user (col. 1, lines 65-67, and col. 2, lines 1-2).

Therefore, it would have been obvious to one of ordinary skill in the art, having the 
teachings of Raman and Dietz before him at the time the invention was made to modify the 
speech user interface of Raman with the JSAPI capabilities of Dietz in order to obtain a speech 
user interface with JSAPI speech synthesizers. 

One would be motivated to make such a combination for the advantage of the 
synthesizer output control given by JSAPI and the related Java Speech Markup Language (col. 
2, lines 3-9). 

Response to Arguments 

Applicant's arguments filed 17 November 2004 have been fully considered but they are 
not persuasive. 

With respect to applicant's argument that Raman fails to teach an "intent-based" 
document, the examiner respectfully disagrees. As noted above Raman teaches forms for 
conducting a dialog with a user, and enabling user-specified transactions, which qualify the 
modality-independent documents of Raman as also being intent-based. 

Consequently, applicant's argument with respect to the combination of Raman and
Ehsani is moot. 



Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as
set forth in 37 CFR 1.136(a).




A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within TWO 
MONTHS of the mailing date of this final action and the advisory action is not mailed until after 
the end of the THREE-MONTH shortened statutory period, then the shortened statutory period 
will expire on the date the advisory action is mailed, and any extension fee pursuant to 37
CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event,
however, will the statutory period for reply expire later than SIX MONTHS from the mailing date 
of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner 
should be directed to Michael Roswell whose telephone number is (571) 272-4055. The 
examiner can normally be reached on 8:30 - 6:00 M-F. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, John Cabeca, can be reached on (571) 272-4048. The fax phone number for the
organization where this application or proceeding is assigned is 703-872-9306.

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private 
PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 



Michael Roswell 
4/13/2005 




